Manipulation of zombie files and evil-twin files

ABSTRACT

The invention provides a method and system for reliably performing extra-long operations in a reliable state-full system (such as a file system). The system records consistency points, or otherwise assures reliability, notwithstanding the continuous performance of extra-long operations and the existence of intermediate states for those extra-long operations. Moreover, performance of extra-long operations is both deterministic and atomic with regard to consistency points (or other reliability techniques used by the system). The file system includes a separate portion of the file system reserved for files having extra-long operations in progress, including file deletion and file truncation. This separate portion of the file system is called the zombie filespace; it includes a separate name space from the regular (“live”) file system that is accessible to users, and is maintained as part of the file system when recording a consistency point. The file system includes a file deletion manager that determines, before beginning any file deletion operation, whether it is necessary to first move the file being deleted to the zombie filespace. The file system includes a zombie file deletion manager that performs portions of the file deletion operation on zombie files in atomic units. The file system also includes a file truncation manager that determines, before beginning any file truncation operation, whether it is necessary to create a complementary file called an “evil twin”. The truncation manager will move all blocks to be truncated from the file being truncated to the evil twin file. The file system includes a zombie file truncation manager that performs portions of the file truncation operation on the evil-twin file in atomic units. An additional advantage provided by the file system is that files having attached data elements, called “composite” files, can be subject to file deletion and other extra-long operations in a natural and reliable manner. 
The file system moves the entire composite file to the zombie filespace, deletes each attached data element individually, and thus resolves the composite file into a non-composite file. If the non-composite file is sufficiently small, the file deletion manager can delete the non-composite file without further need for the zombie filespace. However, if the non-composite file is sufficiently large, the file deletion manager can delete the non-composite file using the zombie filespace.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] This invention relates to file server systems, including those file server systems in which it is desired to maintain reliable file system consistency.

[0003] 2. Related Art

[0004] In systems providing file services, such as those including file servers and similar devices, it is generally desirable for the server to provide a file system that is reliable despite the possibility of error. For example, it is desirable to provide a file system that is reliably in a consistent state, regardless of problems that might have occurred with the file server, and regardless of the nature of the file system operations requested by client devices.

[0005] One known method of providing reliability in systems that maintain state (including such state as the state of a file system or other set of data structures) is to provide for recording checkpoints at which the system is known to be in a consistent state. Such checkpoints, sometimes called “consistency points,” each provide a state to which the system can retreat in the event that an error occurs. From the most recent consistency point, the system can reattempt each operation to reach a state it was in before the error.

[0006] One problem with this known method is that some operations can require substantial amounts of time in comparison with the time between consistency points. For example, in the WAFL file system (as further described in the Incorporated Disclosures), operations on very large files can require copying or modifying very large numbers of file blocks in memory or on disk, and can therefore take a substantial fraction of the time from one consistency point to another. In the WAFL file system, two such operations are deleting very large files and truncating very large files. Accordingly, it might occur that recording a consistency point cannot occur properly while one of these extra-long operations is in progress.

[0007] The fundamental requirement of a reliable file system is that the state of the file system recorded on non-volatile storage must reflect only completed file system operations. In the case of a file system like WAFL that issues checkpoints, every file system operation must be complete between two checkpoints. In the earliest versions of the WAFL file system there was no file deletion manager present; thus very large files created a problem, as it was possible that such large files could not be deleted between the execution of two consistency checkpoints.

[0008] This problem was partially solved in later versions of the WAFL file system, where a file deletion manager was assigned to perform the operation of file deletion, and a consistency point manager was assigned to perform the operation of recording a consistency point. The file deletion manager would attempt to resolve the problem of extra-long file deletions by repeatedly requesting more time from the consistency point manager, thus “putting off” the consistency point manager until a last-possible moment. However, at that last-possible moment, the file deletion manager would be required to give way to the consistency point manager, and allow the consistency point manager to record the consistency point. When this occurred, the file deletion manager would be unable to complete the file deletion operation. In that earlier version of the WAFL file system, instead of completing the file deletion operation, the file deletion manager would move the file to a fixed-length “zombie file” list to complete the file deletion operation. At a later time, a zombie file manager would re-attempt the file deletion operation for those files on the fixed-length zombie file list.

[0009] While this earlier method achieved the general result of performing file deletions on very large files, it has the drawback that it is a source of unreliability in the file system. First, the number of files that could be processed simultaneously as zombie files was fixed in the previous version.

[0010] Second, the file deletion manager and crash recovery mechanism did not communicate. The file deletion manager did not notify the crash recovery mechanism that a file was being turned into a zombie, and the crash recovery mechanism was unable to create zombie files. Thus, to allow a checkpoint to be recorded, a long file would have to be turned into a zombie. If the system crashed at this point, the crash recovery mechanism might not be able to correctly recover the file system, since it was unaware that a zombie file should be created and was incapable of creating zombie files should the need arise. Similarly, the operations of the file deletion manager when creating zombie files, and its operations in deleting those zombie files, were not recorded in non-volatile storage, and thus could not be “replayed” after recovery to duplicate the operations of the file deletion manager.

[0011] Third, since the file deletion manager and replay mechanism did not communicate, the amount of free space could be inaccurately reported. Attempts to restore state could fail, because the amount of free space could be different than that actually available. Attempts to restore state could also fail because the operations of the file deletion manager in using zombie files were not recorded in non-volatile storage; as a result, it might occur that other operations performed during replay could conflict with the file deletion manager and cause a crash.

[0012] Fourth, the earlier method is non-deterministic in the sense that it is not assured whether any particular file deletion operation will be completed before or after a selected consistency point. Moreover, the earlier method does not resolve problems associated with other extra-long file operations, such as requests to truncate very large files to much smaller length.

[0013] Accordingly, it would be advantageous to provide a technique for extra-long operations in a reliable state-full system (such as a file system) that is not subject to the drawbacks of the known art. Preferably, in such a technique, those parts of the system responsible for recording of consistency points are fully aware of the intermediate states of extra-long operations, the performance of extra-long operations is relatively deterministic, and performance of extra-long operations is atomic with regard to consistency points.

SUMMARY OF THE INVENTION

[0014] The invention provides a method and system for reliably performing extra-long operations in a reliable state-full system (such as a file system). The system records consistency points, or otherwise assures reliability (such as using a persistent-memory log file), notwithstanding the continuous performance of extra-long operations and the existence of intermediate states for those extra-long operations. The system provides for replay, after recovery, of those portions of extra-long operations which were completed, thus assuring that recovery and replay are consistent with operations of the file deletion manager and the zombie deletion manager. Moreover, performance of extra-long operations is both deterministic and atomic with regard to consistency points (or other reliability techniques used by the system).

[0015] The file system includes a separate portion reserved for files having extra-long operations in progress, including file deletion and file truncation; this separate portion of the file system is called the zombie filespace. The zombie filespace includes a separate name space from the regular (“live”) file system and is maintained as part of the file system when recording a consistency point, just like the live filespace. The live filespace refers to those files that are accessible to users in normal operation, such as for example those files for which a path can be traced from a root of a hierarchical namespace. The file system includes a file deletion manager that determines, before beginning any file deletion operation, whether it is necessary to first move the file being deleted to the zombie filespace. The file system includes a zombie file deletion manager that performs portions of the file deletion operation on zombie files in atomic units.

[0016] The file system also includes a file truncation manager. Before beginning any file truncation operation, the file truncation manager determines whether it is necessary to create a complementary file called an “evil twin” file, located in the zombie filespace. The truncation manager will move all blocks to be truncated from the file being truncated to the evil twin file. Moving blocks is typically faster and less resource-intensive than deleting blocks. The “evil twin” is subsequently transformed into a zombie file. The file system includes a zombie file truncation manager that can then perform truncation of the zombie file asynchronously in atomic units. Furthermore, the number of files that can be linked to the zombie filespace is dynamic, allowing the zombie filespace the ability to grow and shrink as required to process varying numbers of files.

[0017] An additional advantage provided by the file system is that files having attached data elements, called “composite” files, can be subject to file deletion and other extra-long operations in a natural and reliable manner. When performing such operations for composite files, the file system moves the entire composite file to the zombie filespace, deletes each attached data element individually, and thus resolves the composite file into a non-composite file. If the non-composite file is sufficiently small, the file deletion manager can delete the non-composite file without further need for the zombie filespace. However, if the non-composite file is sufficiently large, the file deletion manager can delete the non-composite file using the zombie filespace.

[0018] The invention provides an enabling technology for a wide variety of applications for reliable systems, so as to obtain substantial advantages and capabilities that are novel and non-obvious in view of the known art. Examples described below primarily relate to reliable file systems, but the invention is broadly applicable to many different types of systems in which reliability and extra-long operations are both present.

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] FIG. 1 shows a block diagram of a portion of a system using a zombie filespace.

[0020] FIG. 2 illustrates a file structure in a system using a zombie filespace.

[0021] FIG. 3 shows a process flow diagram for file deletion in a method for operating a system for manipulation of zombie files and evil-twin files.

[0022] FIG. 4 shows a process flow diagram for file truncation in a method for operating a system for manipulation of zombie files and evil-twin files.

[0023] FIG. 5 shows a process flow diagram for replaying operations in a method for operating a system for manipulation of zombie files and evil-twin files.

[0024] Lexicography

[0025] The following terms refer to or relate to aspects of the invention as described below. The descriptions of general meanings of these terms are not intended to be limiting, only illustrative.

[0026] live filespace—This term generally refers to a portion of the file system where files are available to users in normal operation. In a preferred embodiment, the live filespace includes those inodes (or other types of file control structure) that are not yet allocated to in-use files.

[0027] zombie filespace—This term generally refers to a portion of the file system where files are not available to users in normal operation, but can still be manipulated by the file system as if they were normal files.

[0028] Storage Operating System—in general refers to the computer-executable code operable on a storage system that implements file system semantics and manages data access. In this sense, ONTAP software is an example of such a storage operating system implemented as a microkernel, with its WAFL layer implementing the file system semantics. The storage operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications.

[0029] As noted above, these descriptions of general meanings of these terms are not intended to be limiting, only illustrative. Other and further applications of the invention, including extensions of these terms and concepts, would be clear to those of ordinary skill in the art after perusing this application. These other and further applications are part of the scope and spirit of the invention, and would be clear to those of ordinary skill in the art, without further invention or undue experimentation.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[0030] In the following description, a preferred embodiment of the invention is described with regard to preferred process steps and data structures. Embodiments of the invention can be implemented using general-purpose processors or special purpose processors operating under program control, or other circuits, adapted to particular process steps and data structures described herein. Implementation of the process steps and data structures described herein would not require undue experimentation or further invention.

RELATED APPLICATIONS

[0031] Inventions described herein can be used in conjunction with inventions described in the following documents.

[0032] U.S. patent application Ser. No. 09/642,062, Express Mail Mailing No. EL524780242US, filed Aug. 18, 2000, in the name of Rajesh Sundaram, et al., attorney docket number 103.1034.01, titled “Dynamic Data Space.”

[0033] U.S. patent application Ser. No. 09/642,061, Express Mail Mailing No. EL524780239US, filed Aug. 18, 2000, in the name of Blake Lewis et al., attorney docket number 103.1035.01, titled “Instant Snapshot.”

[0034] U.S. patent application Ser. No. 09/642,065, Express Mail Mailing No. EL524781092US, filed Aug. 18, 2000, in the name of Douglas Doucette, et al., attorney docket number 103.1045.01, titled “Improved Space Allocation in a Write Anywhere File System.” and

[0035] U.S. patent application Ser. No. 09/642,064, Express Mail Mailing No. EL524781075US, filed Aug. 18, 2000, in the name of Scott SCHOENTHAL, et al., attorney docket number 103.1048.01, titled “persistent and reliable Delivery of Event Messages.”

[0036] Each of these documents is hereby incorporated by reference as if fully set forth herein. This application claims priority of each of these documents. These documents are collectively referred to as the “Incorporated Disclosures.”

[0037] System Elements

[0038] FIG. 1 shows a block diagram of a portion of a system using a zombie filespace.

[0039] A system 100 includes a file server 110 including a processor 111, program and data memory 112, a network interface card 115, and mass storage 120.

[0040] The program and data memory 112 include program instructions and data structures used by a file deletion manager 121, a zombie file deletion manager 122, a file truncation manager 123, and a zombie file truncation manager 124.

[0041] The file deletion manager 121 responds to a file server request (such as one received from a user of the file server 110), and performs an operation for deleting a file. As shown herein, the operation for deleting a file might include transferring the file from a live filespace 210 (shown in FIG. 2) to a zombie filespace 250 (shown in FIG. 2) and performing additional operations on the file in the zombie filespace 250. The zombie file deletion manager 122 performs these additional operations.

[0042] Similarly, the file truncation manager 123 responds to a file server request (such as one received from a user of the file server 110), and performs an operation for truncating a file. As shown herein, the operation for truncating a file might include transferring blocks of the file to a zombie filespace 250 and performing additional operations on those blocks in the zombie filespace 250. The zombie file truncation manager 124 performs these additional operations.

[0043] The network interface card 115 couples the file server 110 to a network. In a preferred embodiment, the network includes an Internet, intranet, extranet, virtual private network, enterprise network, or another form of communication network.

[0044] The mass storage 120 can include any device for storing relatively large amounts of information, such as magnetic disks or tapes, optical drives, or other types of mass storage.

[0045] File Structure Example

[0046] FIG. 2 illustrates a file structure in a system using a zombie filespace.

[0047] A file structure 200 includes a live filespace 210, an inode file 220, a live file link 230, a file 240, a zombie filespace 250, and a zombie file link 260.

[0048] The live filespace 210 contains a live root block 211 and all associated blocks of data and metadata for live files. As noted above, “live files” are files in the live filespace, which may be accessed by users in normal operation.

[0049] The inode file 220 is associated with the file to be deleted and contains information about the file. The inode file 220 itself is preferably recorded using a tree structure, in which individual entries 221 for files (including their live file links 230) are maintained at leaves of the tree, and in which one or more indirect blocks 222 are maintained at nodes of the tree to allow the entire inode file 220 to be reached from a root block 223 therefor. Small inode files 220 might not require any indirect blocks 222, or might even be stored directly in data blocks for their containing directory.

[0050] The live file link 230 links a file to the live filespace 210.

[0051] Similar to an inode file 220, the file 240 includes a plurality of file blocks 241, and a plurality of block links 242. The file blocks 241 are connected by the plurality of block links 242. The file 240 is illustrative of a file to be deleted. The structure of the file as defined above is a hierarchical tree-like structure; however, there is no requirement in any embodiment of the invention that the invention be applied only to file structures (or inode structures) of this type. The use of a hierarchical tree-like structure filing system is intended to be illustrative only and not limiting.

[0052] If the file is a composite file, it has attached data elements 243 which are associated with the file 240 (such as possibly by one or more references from the file's inode file 220).

[0053] The zombie filespace 250 contains a zombie root block 251 and all associated blocks of data for zombie files (files in the zombie filespace, which are in the process of being deleted or truncated).

[0054] The zombie file link 260 links a file to be deleted to the zombie filespace 250. A file that has been linked to the zombie filespace 250 is referred to as a “zombie file” while it is so linked. Zombie files in the zombie filespace 250 are maintained in like manner as live files 240 in the live filespace 210.

[0055] Method of Operation—File Deletion

[0056] FIG. 3 shows a process flow diagram for file deletion in a method for operating a system for manipulation of zombie files and evil-twin files.

[0057] A method 300 includes a set of flow points and a set of steps. The system 100 performs the method 300. Although the method 300 is described serially, the steps of the method 300 can be performed by separate elements in conjunction or in parallel, whether asynchronously, in a pipelined manner, or otherwise. There is no particular requirement that the method 300 be performed in the same order in which this description lists the steps, except where so indicated.

[0058] In this method 300, each operation denoted by a flow point is recorded in a file system log, such as a persistent memory that can be accessed in the event of a file system crash or other service interruption. The file system can and does generate checkpoints while these operations are being performed. After a crash, the file system replays the operations noted in the log, as further described with regard to FIG. 5.

[0059] At a flow point 310, a system user selects the file 240 for deletion. User interfaces for this activity vary from system to system but are well known in the art.

[0060] At a flow point 320, the file 240 is identified by the system as a large file requiring zombie processing. In a preferred embodiment, the specific size of a file necessary to trigger zombie processing is parameter-based and software-selectable; however, it can be any set of instructions supporting this functionality, such as instructions hard-coded on a computer chip.

[0061] The file 240 is identified as a large file in response to an amount of time calculated as necessary to delete the file 240. The amount of time is calculated in response to a number of data blocks included in the file, and in response to a size on record for the file. In a preferred embodiment, the file 240 is identified as a large file if it has more than one indirect block 241, that is, if the file 240 has more than about 1,024 data blocks 241. In a preferred embodiment, all composite files 240 are also identified as large files for this purpose.
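
By way of illustration only, the large-file test of the preferred embodiment might be sketched as follows in Python. The names FileInfo, needs_zombie_processing, and LARGE_FILE_BLOCK_THRESHOLD are hypothetical and are not part of the disclosure; the threshold of 1,024 data blocks is taken from the text above, where it is stated only approximately.

```python
from dataclasses import dataclass

# Assumed threshold: the description says "more than about 1,024 data blocks".
LARGE_FILE_BLOCK_THRESHOLD = 1024


@dataclass
class FileInfo:
    """Hypothetical summary of a file; not a structure named in the description."""
    data_block_count: int
    is_composite: bool = False


def needs_zombie_processing(info: FileInfo,
                            threshold: int = LARGE_FILE_BLOCK_THRESHOLD) -> bool:
    """Return True if the file should be treated as a large file."""
    # All composite files are treated as large files for this purpose.
    if info.is_composite:
        return True
    # Otherwise the decision depends on the number of data blocks.
    return info.data_block_count > threshold
```

Passing the threshold as a parameter reflects the statement that the trigger size is software-selectable; other metrics, such as expected log usage, could be substituted in the same place.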

[0062] In alternative embodiments, depending on the underlying implementation of the file system and storage operating system, the file is identified as a large file in response to other metrics of when extra-long operations can consume too many resources at once, hold resources locked for too long a period of time, or otherwise consume too much of a single resource, or some combination thereof, so as to jeopardize correct operation of other parts of the file system and storage operating system. Examples of such other metrics include an amount of log space, a number of log entries, or some other measure of unfinished work needed to be completed, that would be used if the deletion (or truncation) operation is too large.

[0063] At a flow point 325, the file deletion manager 121 determines whether the zombie filespace 250 needs to be enlarged to accommodate another zombie file, and if necessary enlarges the zombie filespace.

[0064] In a preferred embodiment, the file deletion manager 121 attempts to allocate an entry in the zombie filespace 250. If this is possible (that is, at least one entry is available in the zombie filespace 250 for use), the file deletion manager 121 can proceed without requesting enlargement of the zombie filespace 250. If there is no entry available in the zombie filespace 250 for use, the file deletion manager 121 requests the file server 110 to enlarge the zombie filespace 250 (such as by creating another free entry therein), and proceeds to allocate the newly created free entry for use. If the newly created free entry has been allocated by another process, the file deletion manager 121 repeats this flow point until it is able to allocate an entry for its own use.
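
The allocate-or-enlarge loop described above might be sketched as follows. This is a single-threaded illustration under stated assumptions: the class ZombieFilespace, its methods, and the function allocate_zombie_entry are hypothetical names, and the entry table is modeled as a simple in-memory list rather than any on-disk structure.

```python
from typing import List, Optional


class ZombieFilespace:
    """Hypothetical in-memory stand-in for the zombie filespace entry table."""

    def __init__(self, initial_entries: int = 0) -> None:
        # None marks a free entry; otherwise the entry holds a file identifier.
        self.entries: List[Optional[str]] = [None] * initial_entries

    def try_allocate(self, file_id: str) -> Optional[int]:
        """Claim the first free entry, or return None if none is free."""
        for index, owner in enumerate(self.entries):
            if owner is None:
                self.entries[index] = file_id
                return index
        return None

    def enlarge(self) -> None:
        """Create one additional free entry."""
        self.entries.append(None)

    def release(self, index: int) -> None:
        """Mark an entry free again once its zombie file is fully deleted."""
        self.entries[index] = None


def allocate_zombie_entry(space: ZombieFilespace, file_id: str) -> int:
    """Repeat allocation until an entry is obtained, enlarging as needed."""
    while True:
        index = space.try_allocate(file_id)
        if index is not None:
            return index
        # No free entry (or another process took the new one): enlarge and retry.
        space.enlarge()
```

The retry loop mirrors the text: a newly created entry is not reserved for the requester, so the allocation attempt is simply repeated until it succeeds.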

[0065] At a flow point 330, the link connecting the file 240 to the live filespace 210 is terminated. At this point the file 240 is no longer available to users connected to the file server 110.

[0066] In a preferred embodiment, the file deletion manager 121 also alters the generation number of the inode 220 for the file 240, so that external users of the file server 110 can no longer refer to the file 240 by file handles they might have kept. Those users will see the file 240 as having disappeared (that is, been deleted).

[0067] At a flow point 340, the file 240 is linked to the zombie filespace 250 via the zombie file link 260. At this point, file 240 is referred to as a zombie file.

[0068] At a flow point 350, the zombie file deletion manager 122 starts deleting portions of the file 240 by terminating block links 242 at the outer leaves of the file tree. As file blocks 241 are deleted by the zombie deletion manager 122, they become available for storage of other data. This fact is reflected in the free space indicator of the mass storage 120.

[0069] At a flow point 360, the file 240 is deleted. Since the file 240 in the zombie filespace 250 has been deleted, this is equivalent to freeing the inode 220, and any other file system control structure, for the file 240, and terminating any link between the file 240 and the zombie filespace 250.
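
The flow points 330 through 360 might be sketched as follows. This is a minimal illustration, not the disclosed implementation: the live and zombie filespaces are modeled as dictionaries mapping a file name to its list of block numbers, the persistent log is modeled as a list, and the function name delete_large_file and the parameter batch_size (the size of one atomic unit) are hypothetical.

```python
from typing import Dict, List, Tuple


def delete_large_file(name: str,
                      live: Dict[str, List[int]],
                      zombie: Dict[str, List[int]],
                      log: List[Tuple],
                      free_blocks: List[int],
                      batch_size: int = 2) -> None:
    """Delete a large file by way of the zombie filespace, logging each step."""
    # Flow point 330: terminate the link to the live filespace.
    blocks = live.pop(name)
    log.append(("unlink_live", name))

    # Flow point 340: link the file to the zombie filespace.
    zombie[name] = blocks
    log.append(("link_zombie", name))

    # Flow point 350: free blocks in small units, starting from the tail
    # (standing in for the outer leaves of the file tree).
    while zombie[name]:
        batch = zombie[name][-batch_size:]
        del zombie[name][-batch_size:]
        free_blocks.extend(batch)
        log.append(("free_blocks", name, tuple(batch)))

    # Flow point 360: the zombie file itself is removed.
    del zombie[name]
    log.append(("delete_zombie", name))
```

Because each batch is logged as its own entry, a checkpoint could in principle be taken between any two batches with the file in a well-defined intermediate state, which is the property the description attributes to operating in atomic units.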

[0070] Method of Operation—File Truncation

[0071] FIG. 4 shows a process flow diagram for file truncation in a method for operating a system for manipulation of zombie files and evil-twin files.

[0072] A method 400 includes a set of flow points and a set of steps. The system 100 performs the method 400. Although the method 400 is described serially, the steps of the method 400 can be performed by separate elements in conjunction or in parallel, whether asynchronously, in a pipelined manner, or otherwise. There is no particular requirement that the method 400 be performed in the same order in which this description lists the steps, except where so indicated.

[0073] In this method 400, each operation denoted by a flow point is recorded in a file system log, such as a persistent memory that can be accessed in the event of a file system crash or other service interruption. The file system can and does generate checkpoints while these operations are being performed. After a crash, the file system replays the operations noted in the log, as further described with regard to FIG. 5.

[0074] At a flow point 410, a system user selects the file 240 for truncation. User interfaces for this activity vary from system to system but are well known in the art.

[0075] At a flow point 420, the file system (that is, the file system component of the storage operating system) identifies the amount of the file to be truncated as requiring evil twin/zombie processing. In the preferred embodiment, the specific amount of data to be truncated necessary to trigger evil twin/zombie processing is parameter-based and software-selectable; however, it can be any set of instructions supporting this functionality, such as instructions hard-coded on a computer chip. In a preferred embodiment, identification of a file for evil twin processing is similar to identification of a file for zombie processing.

[0076] At a flow point 425, the file truncation manager 123 determines whether the zombie filespace 250 needs to be enlarged to accommodate another zombie file, and if necessary enlarges the zombie filespace. This flow point is similar to the flow point 325.

[0077] At a flow point 430, an evil twin file is created. At this point the file 240 is unavailable to the user. This flow point is similar to the flow points 330 and 340, except that the original file is not removed from the live filespace 210.

[0078] At a flow point 440, blocks of data to be truncated are moved from the file 240 to the evil twin file. Links associating the data blocks to be truncated from the live file in the live filespace are broken, and corresponding links associating the same data blocks with the evil twin file in the zombie filespace are created. This flow point is similar to the flow points 330 and 340, except that only a subset of the data blocks in the original file are removed from the live filespace 210 and transferred to the zombie filespace 250.

[0079] At a flow point 450, file attributes for the file 240 are adjusted appropriately (for example, the size of the file, the number of blocks in the file, and the file's timestamp).

[0080] At a flow point 460, the evil twin file is turned into a zombie file. It is connected to the zombie filespace. This flow point is similar to the flow point 340, except that it is the evil twin, not the original file, which is linked to the zombie filespace 250.

[0081] At a flow point 470, the file 240 is marked as available in the live filespace. At this point the file 240 is available to all users.

[0082] At a flow point 480, the zombie file deletion manager 122 frees all blocks attached to the zombie file.

[0083] At a flow point 490, the zombie file has been deleted and the link to the zombie filespace 250 is terminated. Since the zombie file in the zombie filespace 250 has been deleted, this is equivalent to freeing the inode 220, and any other file system control structure, for the zombie file.
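
The block-moving portion of the truncation method (flow points 430 through 460) might be sketched as follows. As an illustration under stated assumptions, filespaces are again modeled as dictionaries mapping a file name to a list of block numbers; the function name truncate_with_evil_twin and the ".evil-twin" naming scheme are hypothetical and are not part of the disclosure.

```python
from typing import Dict, List


def truncate_with_evil_twin(name: str,
                            new_block_count: int,
                            live: Dict[str, List[int]],
                            zombie: Dict[str, List[int]]) -> str:
    """Move the truncated tail of a live file to an evil twin in the zombie filespace."""
    blocks = live[name]
    twin_name = f"{name}.evil-twin"  # hypothetical naming, for illustration only

    # Flow points 430/440: create the twin and relink the tail blocks to it.
    # The blocks are moved, not freed, so this step is comparatively cheap.
    zombie[twin_name] = blocks[new_block_count:]

    # Flow point 450: the live file keeps only the retained blocks.
    live[name] = blocks[:new_block_count]

    # Flow point 460: the twin now resides in the zombie filespace and can be
    # drained later, in small units, in like manner as any other zombie file.
    return twin_name
```

After this step the live file can be made available again immediately, while the freeing of the moved blocks proceeds asynchronously on the zombie file.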

[0084] Method of Operation—Replay

[0085] FIG. 5 shows a process flow diagram for replaying operations in a method for operating a system for manipulation of zombie files and evil-twin files.

[0086] A method 500 includes a set of flow points and a set of steps. The system 100 performs the method 500. Although the method 500 is described serially, the steps of the method 500 can be performed by separate elements in conjunction or in parallel, whether asynchronously, in a pipelined manner, or otherwise. There is no particular requirement that the method 500 be performed in the same order in which this description lists the steps, except where so indicated.

[0087] At a flow point 510, the file server 110 has recovered from a crash or other service interruption.

[0088] At a step 511, the file server 110 examines its log (preferably recorded in a persistent memory), and determines which log entries should be replayed. In a preferred embodiment, those log entries not marked in the log as being committed as part of a consistency point are required to be replayed. In a preferred embodiment, the log is recorded in a persistent memory and pointed to by at least one link from a persistently recorded file system control block. To quickly determine whether replay is needed, the file system control block is preferably flagged as being “clean” when the system is shut down normally. When rebooting, the system can check each file system to determine if it was shut down cleanly. If it was not, then log entries that reflect changes not present in the on-disk file system must be replayed. There are known techniques for determining which log entries these are. One method is time-stamping when log entries and the file system control block were last updated.

[0089] At a step 512, the file server 110 replays the operation designated by each log entry, thus re-performing those operations.

[0090] At an optional (but preferred) step 513, the file server 110 generates a checkpoint when all log entries have been replayed.

[0091] At a flow point 520, the file server 110 has both recovered from the crash or other service interruption, and replayed all necessary log entries, so normal file server operations can proceed.
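By way of illustration only, the replay of steps 512 and 513 might be sketched as follows, with operations on the live filespace and on the zombie filespace interspersed in a single log. The class `ToyFileSystem`, the operation names, and the tuple layout of a log entry are hypothetical. The sketch further assumes that each logged operation is written so that re-performing it is harmless, which the description does not require.

```python
from typing import Dict, List, Tuple

# Hypothetical log entry: (operation name, file name, integer argument).
Entry = Tuple[str, str, int]


class ToyFileSystem:
    """In-memory stand-in: each filespace maps a file name to its blocks."""

    def __init__(self) -> None:
        self.live: Dict[str, List[int]] = {}
        self.zombie: Dict[str, List[int]] = {}
        self.checkpoints = 0

    def apply(self, entry: Entry) -> None:
        op, name, arg = entry
        if op == "move_to_zombie":
            # Transfer the file from the live to the zombie filespace.
            if name in self.live:
                self.zombie[name] = self.live.pop(name)
        elif op == "shrink_zombie":
            # Free blocks so that at most `arg` blocks remain.
            blocks = self.zombie.get(name)
            if blocks is not None:
                del blocks[arg:]
        elif op == "remove_zombie":
            # Free the control structure for the zombie file.
            self.zombie.pop(name, None)
        elif op == "create_live":
            # An unrelated live-filespace operation in the same log.
            self.live.setdefault(name, [])
        else:
            raise ValueError(f"unknown operation: {op}")

    def checkpoint(self) -> None:
        # Stand-in for recording a consistency point.
        self.checkpoints += 1


def replay(fs: ToyFileSystem, entries: List[Entry]) -> None:
    # Step 512: re-perform each logged operation in log order.
    for entry in entries:
        fs.apply(entry)
    # Step 513 (optional but preferred): checkpoint after replay.
    fs.checkpoint()
```

Because each operation in this sketch records a target state rather than a delta, replaying the same entries a second time leaves both filespaces unchanged.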

[0092] Generality of the Invention

[0093] The invention has general applicability to various fields of use, not necessarily related to the services described above. For example, these fields of use can include one or more of, or some combination of, the following:

[0094] The invention is applicable to all computer systems utilizing large files.

[0095] The invention is applicable to all computer systems performing long-duration operations on files.

[0096] Other and further applications of the invention in its most general form will be clear to those skilled in the art after perusal of this application, and are within the scope and spirit of the invention.

[0097] Although preferred embodiments are disclosed herein, many variations are possible which remain within the concept, scope, and spirit of the invention, and these variations would become clear to those skilled in the art after perusal of this application.

1. A method of operating a filesystem, said filesystem including a live filespace accessible to users and a zombie filespace not accessible to users, said method including recording changes to said zombie filespace in a persistent memory.
2. A method as in claim 1, including, for a deletion operation on a file in said live filespace, transferring said file from said live filespace to said zombie filespace; breaking links associating disk blocks with said file in a plurality of steps while said file is associated with said zombie filespace, wherein said recording of changes includes recording said breaking of links in a plurality of steps; and altering said live filespace to reflect said deletion operation.
3. A method as in claim 1, including, for a truncation operation on a file in said live filespace, transferring at least a portion of said file from said live filespace to said zombie filespace; breaking links associating disk blocks with said file in a plurality of steps while a portion of said file is associated with said zombie filespace, wherein said recording of changes includes recording said breaking of links in a plurality of steps; and altering said live filespace to reflect changes associated with said breaking of links.
4. A method as in claim 1, including, for an operation apparent to users as substantially atomic, performing said operation in a plurality of steps using said zombie filespace, wherein said recording changes is performed in said persistent memory for each of said plurality of steps.
5. A method as in claim 1, including, for an operation performed on a file having attached data elements, performing said operation using said zombie filespace.
6. A method as in claim 1, including, for an operation performed using said zombie filespace, altering a size of said zombie filespace during performance of said operation.
7. A method as in claim 1, including, for an operation performed using said zombie filespace, checkpointing said filesystem during performance of said operation.
8. A method as in claim 1, including recording changes to said live filespace in said persistent memory, wherein records of changes to said live filespace and of changes to said zombie filespace are substantially interspersed.
9. A method as in claim 1, including replaying a set of said changes in response to said record.
10. A method as in claim 1, including replaying a set of said changes to said live filespace and to said zombie filespace, wherein replay of changes includes substantial interspersed performance of changes to said live filespace and to said zombie filespace.
11. A method as in claim 1, including replaying a set of said changes in said record in response to a crash recovery by said filesystem.
12. A method as in claim 1, wherein said persistent memory includes a log of substantially all changes, within a selected time duration, to either said live filespace or said zombie filespace.
13. A method as in claim 1, wherein said persistent memory includes a log of substantially all changes, within a selected time duration, to said zombie filespace.
14. A method as in claim 1, wherein said recorded changes include a set of substantially atomic operations to said zombie filespace.
15. A method of operating a filesystem, said filesystem including a live filespace accessible to users and a zombie filespace not accessible to users, said method including dynamically growing said zombie filespace.
16. A method as in claim 15, including, for a deletion or truncation operation on a file in said live filespace, allocating storage within said zombie filespace for metadata associated with said file; performing said dynamic growth in response to failure of said allocation of storage; re-performing said allocation of storage after said dynamic growth; and transferring said file from said live filespace to said zombie filespace.
17. A method as in claim 15, wherein said dynamic growth occurs, for an operation performed using said zombie filespace, during performance of said operation.
18. A method of operating a filesystem, said filesystem including a live filespace accessible to users and a zombie filespace not accessible to users, said method including transfer of a file to said zombie filespace before breakage of links to blocks in said file, in response to an operation on said file, said operation using said zombie filespace.
19. A method as in claim 18, wherein, for a deletion operation on a file in said live filespace, said transfer includes creating a link associating said file with said zombie filespace; and breaking a link associating said file with said live filespace; and said deletion operation includes breaking links associating disk blocks with said file in a plurality of steps while said file is associated with said zombie filespace, wherein said recording of changes includes recording said breaking of links in a plurality of steps; and altering said live filespace to reflect said deletion operation.
20. A method as in claim 18, wherein, for a truncation operation on a file in said live filespace, said transfer includes creating a link associating at least a portion of said file with said zombie filespace; and breaking a link associating said portion with said file in said live filespace; and said truncation operation includes breaking links associating disk blocks with said file in a plurality of steps while a portion of said file is associated with said zombie filespace, wherein said recording of changes includes recording said breaking of links in a plurality of steps; and altering said live filespace to reflect changes associated with said breaking of links.
21. A method of operating a filesystem, said filesystem including a live filespace accessible to users and a zombie filespace not accessible to users, said method including transfer of a file to said zombie filespace before performing any substantial portion of an operation on said file, said operation using said zombie filespace.
22. A method as in claim 21, wherein, for a deletion operation on a file in said live filespace, said transfer includes creating a link associating said file with said zombie filespace; and breaking a link associating said file with said live filespace; and said deletion operation includes breaking links associating disk blocks with said file in a plurality of steps only while said file is associated with said zombie filespace, wherein said recording of changes includes recording said breaking of links in a plurality of steps; and altering said live filespace to reflect said deletion operation.
23. A method as in claim 21, wherein, for a truncation operation on a file in said live filespace, said transfer includes creating a link associating at least a portion of said file with said zombie filespace; and breaking a link associating said portion with said file in said live filespace; and said truncation operation includes breaking links associating disk blocks with said file in a plurality of steps only while a portion of said file is associated with said zombie filespace, wherein said recording of changes includes recording said breaking of links in a plurality of steps; and altering said live filespace to reflect changes associated with said breaking of links.
24. A method of operating a filesystem, said filesystem including a live filespace accessible to users and a zombie filespace not accessible to users, said method including replay of an operation on a file, said operation using said zombie filespace.
25. A method as in claim 24, wherein said replay is responsive to a set of recorded changes in a persistent memory; and including, for a deletion operation on a file in said live filespace, transferring said file from said live filespace to said zombie filespace, and recording said transfer in said persistent memory; breaking links associating disk blocks with said file in a plurality of steps while said file is associated with said zombie filespace, and recording said breaking of links in said persistent memory in a plurality of steps; and altering said live filespace to reflect said deletion operation, and recording said alteration in said persistent memory.
26. A method as in claim 24, wherein said replay is responsive to a set of recorded changes in a persistent memory; and including, for a truncation operation on a file in said live filespace, transferring at least a portion of said file from said live filespace to said zombie filespace, and recording said transfer in said persistent memory; breaking links associating disk blocks with said file in a plurality of steps while a portion of said file is associated with said zombie filespace, and recording said breaking of links in said persistent memory in a plurality of steps; and altering said live filespace to reflect changes associated with said breaking of links, and recording said alteration in said persistent memory.
27. A method of operating a filesystem, said filesystem including a live filespace accessible to users and a zombie filespace not accessible to users, said method including replay of a set of filesystem operations, said operations including at least some operations using said live filespace and at least some operations using said zombie filespace.
28. A method as in claim 27, wherein said replay is responsive to a set of recorded changes in a persistent memory; and including, for a deletion operation on a file in said live filespace, transferring said file from said live filespace to said zombie filespace, and recording said transfer in said persistent memory; breaking links associating disk blocks with said file in a plurality of steps while said file is associated with said zombie filespace, and recording said breaking of links in said persistent memory in a plurality of steps; and altering said live filespace to reflect said deletion operation, and recording said alteration in said persistent memory.
29. A method as in claim 27, wherein said replay is responsive to a set of recorded changes in a persistent memory; and including, for a truncation operation on a file in said live filespace, transferring at least a portion of said file from said live filespace to said zombie filespace, and recording said transfer in said persistent memory; breaking links associating disk blocks with said file in a plurality of steps while a portion of said file is associated with said zombie filespace, and recording said breaking of links in said persistent memory in a plurality of steps; and altering said live filespace to reflect changes associated with said breaking of links, and recording said alteration in said persistent memory.